







WHY MESH ADAPTATION?
• Need frequent adaptations when solving unsteady 
problems 

• Overhead must be low compared to solver 


UNSTRUCTURED GRIDS 






POWER OF MESH ADAPTATION 
0.25M cells using fine cells in reaction zone 
Resource requirements reduced by factor of 5000 


WHY DYNAMIC LOAD BALANCING? 






PARALLEL ADAPTIVE COMPUTATIONS 





PLUM 



• Efficient data movement scheme (like BSP) 

• Metrics to estimate computational gain and 
communication overhead 



DUAL GRAPH OF INITIAL MESH 
Gives global view of entire computational mesh 
Child elements belong to same partition as parent 
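A minimal sketch of building the dual graph for a small triangle mesh, assuming that dual vertices are mesh elements and that two elements are joined when they share an edge. The function name `build_dual_graph` and the four-triangle mesh are hypothetical illustrations, not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations


def build_dual_graph(triangles):
    """Return {element: set of neighbours}; neighbours share a mesh edge."""
    edge_to_tris = defaultdict(list)
    for t, verts in enumerate(triangles):
        # Each triangle contributes its three edges, stored as sorted vertex pairs.
        for a, b in combinations(sorted(verts), 2):
            edge_to_tris[(a, b)].append(t)
    adj = {t: set() for t in range(len(triangles))}
    for tris in edge_to_tris.values():
        for s, t in combinations(tris, 2):
            adj[s].add(t)
            adj[t].add(s)
    return adj


# Hypothetical initial mesh: a square (vertices 0..3) split around a centre vertex 4.
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
dual = build_dual_graph(triangles)

# Children inherit the partition of their parent element in the initial mesh,
# so only the initial-mesh dual graph ever needs to be partitioned.
partition_of_parent = {0: 0, 1: 0, 2: 1, 3: 1}   # assumed partition labels
child_parent = {"0a": 0, "0b": 0, "2a": 2}        # assumed refinement children
partition_of_child = {c: partition_of_parent[p] for c, p in child_parent.items()}
```

Because refinement only changes the weights attached to dual vertices, the graph itself stays fixed in size as the mesh is adapted.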
SIMILARITY MATRIX CONSTRUCTION 


Matrix S indicates how the remapping weights of the new 
partitions are distributed over the processors 

S_ij = sum of remapping weights of all dual graph 
vertices in new partition j already on processor i 
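The definition of S_ij can be sketched directly, assuming each dual graph vertex carries its current processor, its new partition, and a remapping weight. The function name `similarity_matrix` and the sample data are hypothetical.

```python
def similarity_matrix(current_proc, new_part, remap_weight, n_procs, n_parts):
    """S[i][j] = total remap weight of vertices in new partition j that sit on processor i."""
    S = [[0] * n_parts for _ in range(n_procs)]
    for v, w in enumerate(remap_weight):
        S[current_proc[v]][new_part[v]] += w
    return S


# Hypothetical data for six dual graph vertices on two processors.
current_proc = [0, 0, 1, 1, 0, 1]   # where each vertex lives now
new_part = [0, 1, 1, 0, 1, 1]       # partition chosen by the repartitioner
remap_weight = [3, 1, 4, 2, 5, 1]   # cost of moving each vertex

S = similarity_matrix(current_proc, new_part, remap_weight, n_procs=2, n_parts=2)
print(S)  # [[3, 6], [2, 5]]
```

Row i of S shows how much of processor i's current data falls into each new partition, which is what the reassignment step needs.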

New Partitions 
Use greedy heuristic algorithm for quick approximation 


Allow multipartitioning (reduce data volume at the cost 
of partitioning and reassignment times) 


PROCESSOR REASSIGNMENT 



(Solve optimally as DBMCM problem in O(V^(1/2) E log V)) 

Heuristic algorithm gives suboptimal solution in O(E) 
(provable bounds on quality) 
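One way a greedy reassignment could work is sketched here, assuming one new partition per processor and a similarity matrix S as defined earlier. This is an illustration rather than the exact heuristic from the slides: it uses a comparison sort, so it runs in O(E log E) rather than O(E), and the name `greedy_reassign` and the sample matrix are hypothetical.

```python
def greedy_reassign(S):
    """Map each new partition to a processor, taking the largest S entries first."""
    n_procs, n_parts = len(S), len(S[0])
    entries = sorted(
        ((S[i][j], i, j) for i in range(n_procs) for j in range(n_parts)),
        key=lambda e: -e[0],
    )
    part_to_proc = {}
    used_procs = set()
    for _, i, j in entries:
        # Accept the pair only if both the partition and the processor are still free.
        if j not in part_to_proc and i not in used_procs:
            part_to_proc[j] = i
            used_procs.add(i)
    return part_to_proc


# Hypothetical 3x3 similarity matrix (rows: processors, columns: new partitions).
S = [[5, 1, 0],
     [4, 6, 0],
     [0, 2, 7]]
mapping = greedy_reassign(S)
retained = sum(S[i][j] for j, i in mapping.items())   # weight that does not move
moved = sum(map(sum, S)) - retained                   # weight that must be remapped
```

Maximizing the retained weight is equivalent to minimizing the total data moved, which is the quantity the reassignment step tries to keep small.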



DIFFUSIVE METHODS 
TYPICAL PETAFLOPS SYSTEM FEATURES 


• Large number of processors 
(concurrency, processor-to-processor latency) 

• Deep memory hierarchies 

(data locality, processor-to-memory latency) 

• Programming paradigm? 

(message passing, shared memory, multithreading) 



DATA DECOMPOSITION STRATEGIES 
Easier to program? 

Need sufficient computational work to mask absence 
of data locality 



SCALABILITY ISSUES 



LATENCY HIDING 





MEMORY HIERARCHY 



memory hierarchy 



SELF-AVOIDING WALKS 



applications (issues related to locality and load 
balancing) 

• Now in 2D, easily extensible to 3D 


PROPER SELF-AVOIDING WALKS 



• Existence can be proved for any unstructured mesh 
(by induction that is also a construction process) 
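A small checker may help make the idea concrete. It assumes the usual definition of a self-avoiding walk over a triangular mesh: an enumeration that visits every triangle exactly once, with successive triangles sharing an edge or a vertex. That definition is an assumption here, the additional "proper" constraint is not checked, and the name `is_saw` and the strip mesh are hypothetical.

```python
def is_saw(triangles, walk):
    """True if walk visits every triangle once and successive triangles touch."""
    if sorted(walk) != list(range(len(triangles))):
        return False  # some triangle is missing or visited more than once
    return all(
        set(triangles[a]) & set(triangles[b])   # share at least one vertex
        for a, b in zip(walk, walk[1:])
    )


# Hypothetical strip of four triangles over vertices 0..5.
strip = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

ok = is_saw(strip, [0, 1, 2, 3])        # walks along the strip
bad_jump = is_saw(strip, [0, 3, 1, 2])  # triangles 0 and 3 do not touch
incomplete = is_saw(strip, [0, 1, 2])   # triangle 3 is never visited
```

Such a walk linearizes the mesh, so contiguous pieces of the walk can serve as partitions or as a cache-friendly traversal order.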



CONSTRAINED PSAW 


Generate PSAW for initial mesh (global) 

Translate footprint to a boundary-value problem 

Exploit regularity of refinement rules to restrict CPSAW 
to triangle (local) 

Method inherently parallel 




CONCLUSIONS 
Feasible, provided some significant research 
challenges can be overcome 


